Channel: PyData
Category: Science & Technology
Tags: python, learn to code, education, software, pydata, learn, coding, how to program, julia, opensource, scientific programming, numfocus, python 3, tutorial
Description: Computations as Assets - a New Approach to Reproducibility and Transparency

Speakers: Anders Berkeman, Carl Drougge, Sofia Hörberg

Summary
The ExAx open source project from eBay provides reproducibility, transparency, and fast parallel processing in Python. This talk will show how reproducibility by design actually leads to a simpler and faster development process. To make our examples more interesting, we will use relatively large datasets such as those from NYC Taxi and Backblaze.

Description
Have you ever struggled to keep order in a growing set of temporary data files holding intermediate results from different steps and versions of the processing? Have you ever tried playing with parameters and small code changes to see how they affect the program's performance, and then been unable to remember exactly which setting performed best? Have you ever re-run your complete processing pipeline just to "make sure" that the graph you are looking at actually comes from the data and source code you thought it did?

The ExAx project is designed to avoid problems like these. It treats computations as assets, tagging computed results with links to input data and source code, and storing them permanently on disk in a way that makes them easy to look up and retrieve later. In addition, ExAx provides a simple way to process large datasets in parallel in Python on a single computer.

ExAx is open source from eBay. It runs on anything from laptops to rack servers, it can be used in a production environment with multiple users, and it can easily handle datasets with tens of billions of rows.

Anders Berkeman's Bio
Anders Berkeman has been working with data science and data processing architectures for the last ten years. He is a Master Researcher at Ericsson Research and a frequent teacher of machine learning and data science at Ericsson. He holds a Ph.D. in Applied Electronics.
His previous positions include Data Science Architect at eBay, Chief Architect at Expertmaker (acquired by eBay), Chip Designer at ARM, and Telecommunication Algorithm Developer at Cambridge Silicon Radio. Dr. Berkeman initiated the ExAx project in 2012, and he has been actively involved in the project since then.
GitHub: github.com/berkeman
LinkedIn: linkedin.com/in/andersberkeman
Website: expertmakeraccelerator.org

Carl Drougge's Bio
Carl Drougge is a senior software developer with wide experience, ranging from high-level Python programming to the design of efficient low-level libraries in C or assembler. He is a Researcher at Ericsson Research. His previous positions include Software Engineer at eBay and software guru at Expertmaker (acquired by eBay). Carl Drougge is the maintainer of the Accelerator project, which he has been involved in since the start.
GitHub: github.com/drougge

Sofia Hörberg's Bio
LinkedIn: linkedin.com/in/sofia-h%C3%B6rberg-95364076

PyData Global 2021
Website: pydata.org/global2021
LinkedIn: linkedin.com/company/pydata-global
Twitter: twitter.com/PyData

pydata.org

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/YouTubeVideoTimestamps